The current status of gene expression profilings in COVID‐19 patients

Abstract
Background
The global pandemic of coronavirus disease 2019 (COVID‐19), caused by severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2), has swept through every part of the world. Because of its impact, international efforts have been underway to identify the variants of SARS‐CoV‐2 by genome sequencing and to understand the gene expression changes in COVID‐19 patients compared to healthy donors using RNA sequencing (RNA‐seq) assays. In the two and a half years since the emergence of SARS‐CoV‐2, a large amount of OMICS data from COVID‐19 patients has accumulated. Yet, we are still far from understanding the disease mechanism. Further, many people suffer from long‐term effects of COVID‐19, which calls for a more systematic way to mine the generated OMICS data, especially RNA‐seq data.
Methods
By searching the Gene Expression Omnibus (GEO) using the key terms COVID‐19 and RNA‐seq, 108 GEO entries were identified. Each of these studies was manually examined and categorized as bulk or single‐cell RNA‐seq (scRNA‐seq), followed by an inspection of the original article.
Results
The currently available RNA‐seq data were generated from various types of patient samples and COVID‐19‐related sample materials, including whole blood, different components of blood [e.g., plasma, peripheral blood mononuclear cells (PBMCs), leukocytes, lymphocytes, monocytes, T cells], nasal swabs, and autopsy samples (e.g., lung, heart, liver, kidney). Of these, RNA‐seq studies using whole blood, PBMCs, nasal swabs, and autopsy/biopsy samples were reviewed to highlight the major findings from RNA‐seq data analysis.
Conclusions
Based on the bulk and scRNA‐seq data analysis, severe COVID‐19 patients display shifts in cell populations, especially those of leukocytes and monocytes, possibly leading to cytokine storms and immune silence. These RNA‐seq data form the foundation for further gene expression analysis using samples from individuals suffering from long COVID.


INTRODUCTION
The rise of next-generation sequencing, especially RNA sequencing (RNA-seq), has revolutionized the way we conduct research. Because the cost of RNA-seq experiments has decreased, RNA-seq is now commonly used as the first step of research to profile gene expression changes in one condition compared to another. With the development of more elaborate assays, gene expression profiling at the single-cell level has become possible; these assays are collectively called single-cell RNA-seq (scRNA-seq). In contrast, the term bulk RNA-seq is used for RNA-seq assays other than scRNA-seq. It is now common practice, and a requirement of most journals, to deposit the generated RNA-seq data before a study is published. These data are readily available from public repositories, such as the Gene Expression Omnibus (GEO), ArrayExpress, and the Sequence Read Archive (SRA). Such data sharing allows secondary analysis of previously published RNA-seq data, in which two or more similar studies are combined to discover gene expression changes from a perspective different from the one originally intended.
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causative virus of the global pandemic of coronavirus disease 2019 (COVID-19). Because of its global impact, numerous approaches, especially those using high-throughput OMICS techniques, have been taken to characterise the genomic mutations of this virus as well as its impact on COVID-19 patients, especially using RNA-seq assays. Because RNA viruses mutate rapidly, SARS-CoV-2 has mutated and acquired higher infection rates in humans. 1 These mutations are closely monitored by genomic sequencing of samples from COVID-19 patients around the world. Although various mutations and dominant variants of SARS-CoV-2 have been identified, the symptoms and severity of COVID-19 vary significantly depending, in part, on underlying conditions (e.g., older ages, diabetes, obesity, gender). 2 The symptoms of COVID-19 are diverse, especially in those suffering from long-term symptoms. The specific term, long COVID, has been coined to describe the condition of those suffering for more than 3 months. 3 The current findings indicate that damage to endothelial and nerve cells might be responsible for the short- and long-term COVID symptoms, which result in damage to the lungs, heart, brain, and other vital organs of those infected. 4,5 With the appearance of a less life-threatening variant of SARS-CoV-2, the BA.2 variant (so-called stealth omicron 6 ), research has shifted to understanding the chronic complications of COVID-19, including chest pain, cough, fatigue, headaches, joint pain, loss of smell or taste, and shortness of breath. Because of the global interest in elucidating the disease mechanism, many data have been collected, including OMICS data. Yet, we are still far from understanding the whole spectrum of the negative impact of SARS-CoV-2 on human health, as illustrated by its possible causative contribution to the rise of mysterious hepatitis in children in recent weeks. 7
Thus, it is clear that more systematic approaches are urgently needed to understand the impact of long COVID. To facilitate such approaches, this Mini-Review surveys the current status of gene expression profiling of COVID-19 patients using the RNA-seq technique.

PUBLICLY AVAILABLE RNA-SEQ DATA OF COVID-19 PATIENTS AND COVID-RELATED RESEARCH
To screen for genes affected by SARS-CoV-2 and possibly responsible for the symptoms of COVID-19 patients, both bulk RNA-seq and scRNA-seq techniques have been used. Because of the global impact of COVID-19, various types of patient samples and COVID-19-related sample materials have been sequenced at the level of RNA, including whole blood, different components of blood [e.g., plasma, peripheral blood mononuclear cells (PBMCs), leukocytes, lymphocytes, monocytes, T cells], nasal swabs, and autopsy samples (e.g., lung, heart, liver, kidney) (Table 1).
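As an illustration of how such a collection of entries could be triaged into bulk and scRNA-seq studies before manual inspection, the following minimal Python sketch flags entries by keyword. It is not the procedure used for this review, which relied on manual examination; the keyword pattern, the accession labels, and the descriptions are all hypothetical.

```python
import re

# Hypothetical keyword pattern (an assumption, not a GEO field): entry
# descriptions are free text, so this is only a first-pass triage that
# must be followed by manual inspection.
SINGLE_CELL_PATTERN = re.compile(r"single[-\s]?cell|scRNA", re.IGNORECASE)


def categorize_entry(description: str) -> str:
    """Return 'scRNA-seq' if the text mentions a single-cell assay, else 'bulk'."""
    return "scRNA-seq" if SINGLE_CELL_PATTERN.search(description) else "bulk"


# Made-up example records (placeholder accession -> free-text description).
entries = {
    "GSE_EXAMPLE_1": "Whole blood RNA-seq of hospitalized COVID-19 patients",
    "GSE_EXAMPLE_2": "Single-cell RNA sequencing of PBMCs from COVID-19 patients",
    "GSE_EXAMPLE_3": "scRNA-seq of bronchoalveolar lavage fluid",
}

categories = {acc: categorize_entry(text) for acc, text in entries.items()}
for accession, category in categories.items():
    print(accession, category)
```

A keyword match of this kind can misclassify studies that mention both assay types, which is one reason an inspection of the original articles remains necessary.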

Whole blood
The drawing of blood is a standard medical practice to diagnose various diseases. Thus, it is no surprise that many RNA-seq data of whole blood from COVID-19 patients compared to that of healthy donors are available. For example, the analysis of RNA-seq data of whole blood from 42 severe hospitalized COVID-19 patients compared to 10 healthy donors shows that 4 079 genes are differentially expressed at the threshold of 1.5-fold change. 8 Not surprisingly, many genes involved in immune response (e.g., neutrophil and interferon signalling, T and B cell receptor responses) are differentially regulated, especially CD177, a marker of neutrophil activation. Another study comparing RNA-seq data  9 In contrast to people with underlying conditions, many people infected with SARS-CoV-2 are asymptomatic. Thus, the RNA-seq data of the COVID-19 Health Action Response for Marines (CHARM) study are of great interest because the study collected whole blood from 475 subjects at different time points during the initial SARS-CoV-2 outbreak and later surveillance of United States Marine recruits. 10 Yet, the original publication of this study did not explore the RNA-seq data in detail because it concentrated on proteomic analysis, which identified an elevated level of serum IL-17C in asymptomatic participants compared to those with COVID-19 symptoms (Figure 1). As this study generated 1 858 time-course RNA-seq data sets, re-analysis of these data will be of great interest to further elucidate the gene expression changes associated with COVID-19 symptoms.
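To make the notion of a fold-change threshold concrete, the sketch below classifies genes as up-regulated, down-regulated, or unchanged at a 1.5-fold cutoff. It is a simplified illustration and not the analysis pipeline of the cited study: the expression values are made up, the pseudocount is an assumption, and a real analysis would also apply a statistical test.

```python
import math

FOLD_CHANGE_THRESHOLD = 1.5  # threshold quoted in the text


def log2_fold_change(mean_patient: float, mean_control: float,
                     pseudocount: float = 1.0) -> float:
    """log2 ratio of mean expression; the pseudocount (an assumption) avoids log(0)."""
    return math.log2((mean_patient + pseudocount) / (mean_control + pseudocount))


def classify(mean_patient: float, mean_control: float,
             threshold: float = FOLD_CHANGE_THRESHOLD) -> str:
    """Label a gene 'up', 'down', or 'unchanged' using a symmetric fold-change cutoff."""
    lfc = log2_fold_change(mean_patient, mean_control)
    cutoff = math.log2(threshold)
    if lfc >= cutoff:
        return "up"
    if lfc <= -cutoff:
        return "down"
    return "unchanged"


# Made-up mean normalized expression values: (patients, healthy donors).
# GENE_A and GENE_B are placeholders; the CD177 numbers are invented too.
toy_means = {
    "CD177": (300.0, 20.0),
    "GENE_A": (50.0, 52.0),
    "GENE_B": (10.0, 80.0),
}

calls = {gene: classify(p, c) for gene, (p, c) in toy_means.items()}
print(calls)
```

Working on the log2 scale makes the cutoff symmetric, so that a 1.5-fold increase and a 1.5-fold decrease are treated alike.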

Peripheral blood mononuclear cells
Besides whole blood, different components of blood have been used to perform RNA-seq assays. A PBMC is any blood cell with a round nucleus, including lymphocytes [T cells, B cells, natural killer (NK) cells] and monocytes. 11,12 Because PBMCs include different immune cell types, scRNA-seq assays are employed to decipher transcriptome dynamics and cell-type differences in COVID-19 patients compared to healthy donors. For example, the analysis of scRNA-seq data of PBMCs collected from 11 healthy donors, 5 asymptomatic individuals, 33 individuals with moderate COVID-19 symptoms, 10 individuals with severe COVID-19 symptoms, and two time-point data of two individuals with severe COVID-19 symptoms identified 76 cell subpopulations associated with various clinical presentations of COVID-19 patients, 13 highlighting the complicated cell-type landscapes of COVID-19 symptoms. Although such identification of cell subpopulations is important, follow-up studies focusing on the functions of these subpopulations are necessary. Across all age groups, males have a higher rate of respiratory intubation, a longer length of hospital stay, and a higher death rate from COVID-19 compared to females. 14 To address the gender differences in COVID-19 patients, scRNA-seq combined with flow cytometry analysis of 10 healthy donors, 9 COVID-19 inpatients (hospitalized), 19 outpatients (infected), and 7 uninfected close contacts (exposed) showed that circulating mucosal-associated invariant T (MAIT) cells were recruited to airway tissues more robustly in female COVID-19 patients than in male COVID-19 patients, as circulating MAIT cells are more frequent in females than in males in the healthy setting. 15 Interestingly, this study identified In contrast, MAITβ cells are defined as pro-apoptotic based on the enriched expression of genes categorized under cellular responses to external stimuli, metabolism of RNA, viral infection, and programmed cell death but not immune processes. In the healthy setting, MAITα cells are dominant in females, while MAITβ cells are dominant in males. Based on these findings, the authors conclude that a female-specific protective MAIT subpopulation might be responsible for the reduced severity of COVID-19 symptoms and death.
Although COVID-19 vaccines have been developed to reduce the mortality rate, an effective treatment for COVID-19 patients is still lacking. So far, several therapeutic approaches have been taken. One of these is the use of tocilizumab (Actemra), an immunosuppressive drug targeting the IL-6 receptor. Using a time-course scRNA-seq experiment of severe COVID-19 patients treated with tocilizumab, it was found that a subpopulation of monocytes contributes to the inflammatory cytokine storms of severe COVID-19 patients. This monocyte subpopulation expresses CCL3, IL6, IL10, TNF, inflammation-related chemokine genes (CCL4, CCL20, CXCL2, CXCL3, CCL3L1, CCL4L2, CXCL8, and CXCL9), and inflammasome activation-associated genes (NLRP3 and IL1B). 16 Further, humoral and cell-mediated antiviral immune responses were sustained even upon treatment with tocilizumab, suggesting that further treatment targeting these cell populations is needed for COVID-19-related cytokine storms (Figure 2B).

Nasal swabs
Nasal or nasopharyngeal swabs are commonly used to test for the presence of SARS-CoV-2. Besides detecting fragments of viral RNA, they allow genome-wide transcriptomic analysis of the host (i.e., COVID-19 patients). For example, by comparing RNA-seq data generated from naso/oropharyngeal swabs of 36 Indian COVID-19 patients hospitalized during the first surge of COVID-19 to those of 5 COVID-19-negative control samples, 251 up- and 9 068 down-regulated genes were identified at the threshold of two-fold change and adjusted p-value < .05. 17 The differentially expressed genes include up-regulated genes involved in innate immune response (e.g., interferon signalling, response to virus) and down-regulated genes involved in membrane potentials and neurotransmitter transport as well as cardiac, muscular, and neurological processes, suggesting that significant down-regulation of host transcriptomes can be monitored via nasal swabs. By performing RNA-seq assays of whole blood and/or nasopharyngeal swabs of COVID-19 patients compared to healthy donors and individuals with other viral acute respiratory infections (i.e., influenza or seasonal coronavirus infection) or non-viral acute respiratory illness (i.e., bacterial sepsis) (a total of 404 bulk RNA-seq data), the activation of interferon-mediated antiviral pathways and the inhibition of other immune and inflammatory pathways (e.g., nuclear factor κB, TREM1, NK cell signalling pathways) were identified, suggesting an overall dysregulated immune response in COVID-19 patients. 18 This study is particularly interesting because COVID-19-specific gene expression changes were inferred by comparison with other infectious diseases.
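The combination of a fold-change threshold with an adjusted p-value can be sketched as follows. The example assumes the Benjamini-Hochberg procedure for the adjustment, which is a common choice but is not confirmed here as the method of the cited study; the gene names, fold changes, and p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def split_by_direction(results, lfc_cutoff=1.0, alpha=0.05):
    """Split genes into up- and down-regulated lists.

    results: list of (gene, log2 fold change, raw p-value).
    lfc_cutoff=1.0 on the log2 scale corresponds to a two-fold change.
    """
    adjusted = benjamini_hochberg([p for _, _, p in results])
    up, down = [], []
    for (gene, lfc, _), q in zip(results, adjusted):
        if q < alpha and lfc >= lfc_cutoff:
            up.append(gene)
        elif q < alpha and lfc <= -lfc_cutoff:
            down.append(gene)
    return up, down


# Made-up results for four placeholder genes.
toy_results = [
    ("GENE_1", 2.3, 0.001),
    ("GENE_2", -1.8, 0.004),
    ("GENE_3", 0.4, 0.020),
    ("GENE_4", -2.5, 0.300),
]

up_genes, down_genes = split_by_direction(toy_results)
print("up:", up_genes, "down:", down_genes)
```

In this toy input, GENE_3 passes the significance filter but not the fold-change filter, and GENE_4 shows the reverse, which illustrates why the two criteria are applied together.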

Autopsy and biopsy samples
It is now clear that the first response to SARS-CoV-2 infection is through innate immune responses, leading to strong and dysregulated inflammatory responses and prolonged effects in various tissues. 19 Thus, gene expression profiling of autopsy samples from COVID-19 patients is informative for understanding the prolonged effects of SARS-CoV-2 on the human body. Using a newly developed COVID-19 autopsy biobank covering 11 organs and 17 donors, scRNA-seq experiments were performed to profile 24 lung, 16 kidney, 16 liver, and 19 heart autopsy tissues of individuals who died of COVID-19. 20 Through detailed analysis of these data, the authors uncovered altered cellular compartments, especially in the lungs, where a defect in alveolar type 2 differentiation was recorded. This study provides a valuable source of autopsy samples as well as OMICS data, including bulk RNA-seq, scRNA-seq, and single-nucleus RNA-seq data. It would be of interest to compare these data to other RNA-seq data of autopsy samples 21-23 (Table 1) to identify common defects in tissue regeneration in COVID-19 patients with regard to dysregulated signalling pathways. Without a doubt, the lungs are the organ most affected by SARS-CoV-2. Thus, intensive research focusing on gene expression profiling in the lungs has been conducted. For example, scRNA-seq data of bronchoalveolar lavage fluids (BAL) and matched peripheral blood samples from 21 severe COVID-19 patients admitted to intensive care units (ICU) and of peripheral blood from 6 mild COVID-19 patients and 5 healthy donors show that the severe COVID-19 patients had a higher proportion of neutrophils and a decreased proportion of lymphocytes in their blood samples compared to the other two sample groups. 24 In BAL, the gene expression of pro-inflammatory M1 macrophages [characterized by the expression of SPP1 (osteopontin)] was induced and associated with a better prognosis for severe COVID-19 patients (Figure 3). Based on these data, the authors conclude that immune silence in severe COVID-19 patients may stem from myeloid dysregulation and lymphoid impairment. As with any other scRNA-seq study, follow-up functional and mechanistic studies of the identified subpopulations of cells are necessary to firmly establish the observations made by scRNA-seq data analysis.
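Shifts in cell populations of the kind described above are typically summarized as the proportion of each cell type per sample group. The following minimal sketch computes such proportions from per-cell annotations; the group labels, cell-type labels, and counts are made up and do not reproduce the data of the cited study.

```python
from collections import Counter, defaultdict


def cell_type_proportions(cells):
    """Compute per-group cell-type proportions.

    cells: iterable of (sample_group, cell_type) annotations, one per cell.
    Returns {sample_group: {cell_type: fraction of that group's cells}}.
    """
    counts = defaultdict(Counter)
    for group, cell_type in cells:
        counts[group][cell_type] += 1
    proportions = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        proportions[group] = {ct: n / total for ct, n in counter.items()}
    return proportions


# Made-up annotations: 10 cells per group.
toy_cells = (
    [("severe", "neutrophil")] * 6
    + [("severe", "lymphocyte")] * 2
    + [("severe", "monocyte")] * 2
    + [("healthy", "neutrophil")] * 3
    + [("healthy", "lymphocyte")] * 5
    + [("healthy", "monocyte")] * 2
)

props = cell_type_proportions(toy_cells)
for group, fractions in props.items():
    print(group, fractions)
```

Because such proportions are compositional (an increase in one cell type lowers the fractions of the others), they are best interpreted together with absolute counts or a dedicated statistical model.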

CONCLUSION
A longitudinal cohort study of COVID-19 patients who had survived hospitalization indicates that even two years after discharge from Jin Yin-tan Hospital (Wuhan, China), survivors with long COVID symptoms had a lower health-related quality of life (HRQoL), worse exercise capacity, more mental health abnormalities, and increased healthcare use after discharge compared to those without long COVID symptoms. 25 This indicates that a mechanistic understanding of the long-term effects of COVID-19 is urgently needed. To this end, more RNA-seq data should be generated from individuals with long COVID symptoms. Such newly generated data can be compared to the previously generated data listed in Table 1 in a comparative analysis of transcriptomic data to understand how gene expression changes affect COVID-19 patients. Some published studies have already performed secondary analysis of previously generated RNA-seq and microarray data of COVID-19 patients compared to healthy donors and individuals with other illnesses [e.g., SARS and the Middle East respiratory syndrome (MERS), lupus]. 26-29 Yet, to understand the disease mechanism of SARS-CoV-2, RNA-seq data from long COVID patients should be generated not only from blood or blood-related materials but also from tissue biopsy samples of the areas affected by SARS-CoV-2. Furthermore, more systematic analysis of RNA-seq data combined with other OMICS data (e.g., genomics, proteomics, metabolomics), especially time-course data, is urgently needed. These data should be analysed not only for gene expression changes but also for gene regulatory networks, and machine learning algorithms should be trained on them to predict early diagnostic biomarkers of long COVID. It is also important to note that gene expression changes should be verified at the protein level, including by proteomics and fluorescence-activated cell sorting (FACS) analysis. Such combined approaches will help to understand the disease mechanisms by which SARS-CoV-2 causes long COVID.

CONFLICT OF INTEREST
The authors declare that there is no conflict of interest that could be perceived as prejudicing the impartiality of the research reported.